35 research outputs found

Trie-Compressed Adaptive Set Intersection

Author: Arroyuelo Diego
Castillo Juan Pablo
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)
Publication date: 01/01/2023

We introduce space- and time-efficient algorithms and data structures for the offline set intersection problem. We show that a sorted integer set S ⊆ [0..u) of n elements can be represented using compressed space while supporting k-way intersections in adaptive O(kδ lg(u/δ)) time, δ being the alternation measure introduced by Barbay and Kenyon. Our experimental results suggest that our approaches are competitive in practice, outperforming the most efficient alternatives (Partitioned Elias-Fano indexes, Roaring Bitmaps, and Recursive Universe Partitioning (RUP)) in several scenarios, offering in general relevant space-time trade-offs.
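
To make the k-way intersection problem concrete, here is a minimal sketch over plain sorted Python lists. It is not the trie-based compressed structure the abstract describes: it swaps in the classic galloping (doubling) search to skip ahead in each list, and the names `gallop` and `intersect` are hypothetical.

```python
from bisect import bisect_left


def gallop(lst, target, lo):
    """Return the smallest index i >= lo with lst[i] >= target (len(lst) if none)."""
    step = 1
    hi = lo
    # Double the step until we reach an element >= target or run off the end.
    while hi < len(lst) and lst[hi] < target:
        lo = hi + 1
        hi += step
        step *= 2
    # Binary search inside the last bracketed range.
    return bisect_left(lst, target, lo, min(hi, len(lst)))


def intersect(sets):
    """Intersect k sorted lists of distinct integers (illustrative only)."""
    if not sets or any(len(s) == 0 for s in sets):
        return []
    k = len(sets)
    pos = [0] * k            # current cursor in each list
    result = []
    candidate = sets[0][0]   # value we are currently trying to confirm
    agree = 0                # consecutive lists known to contain candidate
    i = 0
    while True:
        s = sets[i]
        pos[i] = gallop(s, candidate, pos[i])
        if pos[i] == len(s):
            return result    # one list is exhausted, nothing more can match
        if s[pos[i]] == candidate:
            agree += 1
            if agree >= k:
                # Every list contains candidate: report it and move on.
                result.append(candidate)
                pos[i] += 1
                if pos[i] == len(s):
                    return result
                candidate = s[pos[i]]
                agree = 1
        else:
            # Overshot: the larger value becomes the new candidate.
            candidate = s[pos[i]]
            agree = 1
        i = (i + 1) % k
```

The round-robin loop only inspects elements near the points where the lists interleave, which is the intuition behind adaptive bounds: inputs whose lists alternate rarely need few probes.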

Dagstuhl Research Online Publication Server

Fully dynamic and memory-adaptative spatial approximation trees

Author: Arroyuelo Diego
Navarro Gonzalo
Reyes Nora Susana
Publication date: 01/10/2003

Hybrid dynamic spatial approximation trees are recently proposed data structures for searching in metric spaces, based on combining the concepts of spatial approximation and pivot-based algorithms. These data structures are hybrid schemes that offer the full features of dynamic spatial approximation trees and can use the available memory to improve query time. It has been shown that they compare favorably against alternative data structures in spaces of medium difficulty. In this paper we complete and improve hybrid dynamic spatial approximation trees by presenting a new search alternative, an algorithm to remove objects from the tree, and an improved way of managing the available memory. The result is a fully dynamic and optimized data structure for similarity searching in metric spaces.
Track: Theory (TEOR). Red de Universidades con Carreras en Informática (RedUNCI)

Distributed search based on self-indexed compressed text

Author: Arroyuelo Diego
Gil Costa Graciela Verónica
González Senén
Marin Mauricio
Oyarzún Mauricio
Publication venue: Pergamon-Elsevier Science Ltd
Publication date: 01/03/2012

Query response times within a fraction of a second in Web search engines are feasible due to the use of indexing and caching techniques, which are devised for large text collections partitioned and replicated into a set of distributed-memory processors. This paper proposes an alternative query processing method for this setting, which is based on a combination of self-indexed compressed text and posting-list caching. We show that a text self-index (i.e., an index that compresses the text and is able to extract arbitrary parts of it) can be competitive with an inverted index if we consider the whole query process, which includes index decompression, ranking and snippet extraction time. The advantage is that within the space of the compressed document collection, one can carry out posting-list generation, document ranking and snippet extraction. This significantly reduces the total number of processors involved in the solution of queries. Alternatively, for the same amount of hardware, the performance of the proposed strategy is better than that of the classical approach based on treating inverted indexes and corresponding documents as two separate entities in terms of processors and memory space.
Affiliations: Arroyuelo, Diego: not specified. Gil Costa, Graciela Verónica: Universidad Nacional de San Luis, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico Conicet - San Luis, Argentina. González, Senén: not specified. Marin, Mauricio: Universidad de Santiago de Chile, Chile. Oyarzún, Mauricio: Universidad de Santiago de Chile, Chile.

CONICET Digital

Fully dynamic and memory-adaptative spatial approximation trees

Author: Depto De Informática
Diego Arroyuelo
Gonzalo Navarro
Nora Reyes
Publication date: 05/04/2012

CiteSeerX

Servicio de Difusión de la Creación Intelectual

Bases de datos no convencionales [Non-conventional databases]

Author: Arroyuelo Diego
Ludueña Verónica
Navarro Gonzalo
Reyes Nora Susana
Publication date: 20/09/2012

With the evolution of information and communication technologies, unstructured information repositories have emerged. Not only are new data types such as free text, images, audio, and video being queried; in some cases the information can no longer be structured into keys and records. Even when a classical structuring is possible, new applications such as data mining require accessing the database by any field, not only by those marked as "keys". These scenarios call for more general models, such as text databases or metric spaces, among others, and for tools that support efficient searches over these types of data. The techniques emerging from these fields reveal a research area well suited to developing tools that efficiently solve the problems involved in managing non-conventional databases.
Track: Databases. Red de Universidades con Carreras en Informática (RedUNCI)

Servicio de Difusión de la Creación Intelectual

Fast in-memory XPath search using compressed indexes

Author: Arroyuelo Diego
Claude Francisco
Maneth Sebastian
Mäkinen Veli
Navarro Gonzalo
Nguyen Kim
Sirén Jouni
Välimäki Niko
Publication venue: IEEE Computer Society
Publication date: 01/01/2010

A large fraction of an XML document typically consists of text data. The XPath query language allows text search via the equal, contains, and starts-with predicates. Such predicates can be efficiently implemented using a compressed self-index of the document's text nodes. Most queries, however, contain some parts querying the text of the document, plus some parts querying the tree structure. It is therefore a challenge to choose an appropriate evaluation order for a given query, which optimally leverages the execution speeds of the text and tree indexes. Here the SXSI system is introduced. It stores the tree structure of an XML document using a bit array of opening and closing brackets plus a sequence of labels, and stores the text nodes of the document using a global compressed self-index. On top of these indexes sits an XPath query engine that is based on tree automata. The engine uses fast counting queries of the text index in order to dynamically determine whether to evaluate top-down or bottom-up with respect to the tree structure. The resulting system has several advantages over existing systems: (1) on pure tree queries (without text search) such as the XPathMark queries, the SXSI system performs on par with or better than the fastest known systems MonetDB and Qizx, (2) on queries that use text search, SXSI outperforms the existing systems by 1-3 orders of magnitude (depending on the size of the result set), and (3) with respect to memory consumption, SXSI outperforms all other systems for counting-only queries.
Peer reviewed
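
The "bit array of opening and closing brackets plus a sequence of labels" mentioned in the abstract can be illustrated with a small sketch. This is an assumption-laden toy, not the SXSI implementation: the tree is a nested `(label, children)` tuple, the names `encode`, `find_close`, and `subtree_size` are hypothetical, and the matching bracket is found by a linear scan, whereas succinct tree representations typically add auxiliary structures to answer such queries much faster.

```python
def encode(tree):
    """Encode a (label, children) tree as a bracket bit array plus preorder labels."""
    bits, labels = [], []

    def visit(node):
        label, children = node
        bits.append(1)            # 1 = opening bracket
        labels.append(label)      # labels are stored in preorder
        for child in children:
            visit(child)
        bits.append(0)            # 0 = closing bracket

    visit(tree)
    return bits, labels


def find_close(bits, i):
    """Index of the closing bracket matching the opening bracket at i (linear scan)."""
    depth = 0
    for j in range(i, len(bits)):
        depth += 1 if bits[j] else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced bracket sequence")


def subtree_size(bits, i):
    """Number of nodes in the subtree whose opening bracket is at position i."""
    return (find_close(bits, i) - i + 1) // 2
```

Each node costs two bits of structure, and a subtree is a contiguous range of the bit array, so navigation reduces to bracket matching.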

CiteSeerX

Helsingin yliopiston digitaalinen arkisto

Bases de datos no convencionales [Non-conventional databases]

Author: Arroyuelo Diego
Ludueña Verónica
Navarro Gonzalo
Reyes Nora Susana
Publication date: 01/05/2004

Managing Compressed Structured Text

Author: Diego Arroyuelo
G Gottlob
G Navarro
Gonzalo Navarro
J Barbay
M Lohrey
NR Brisaboa
P Ferragina
Paolo Ferragina
R Baeza-Yates
S Sakr
V Mäkinen
Publication venue: Springer Nature
Publication date: 07/12/2018

[Definition]: Compressing structured text is the problem of creating a reduced-space representation from which the original data can be re-created exactly. Compared to plain text compression, the goal is to take advantage of the structural properties of the data. A more ambitious goal is to be able to manipulate this text in compressed form, without decompressing it. This entry focuses on compressing, navigating, and searching structured text, as those are the areas where the most advances have been made.

Repositorio da Universidade da Coruña

Crossref